Tag
1 article
OpenAI plans to retire the SWE-bench Verified benchmark, citing flaws that undermine its validity as a coding performance measure. The move highlights concerns about memorization in AI model evaluations.